Give a brief a description of your project and its goal(s), what data you are using to complete it, and what three faculty/staff in different fields you have spoken to about your project with a brief summary of what you learned from each person. Include a link to your final project GitHub repository.
This project aims at identifying predictors of health outcomes using socioeconomic factors. Specifically, income, housing costs, food security, and food costs will be tested for association with surveyed outocmes from Behavioral Risk Factor Surveillance System (BRFSS) epidemiologic data. BRFSS data is tracked at the metropolitan level, and SMART BRFSS 2017 data was downloaded as XPT file here Measures of mental and physical health will be analyzed using secondary data from the American Community Survey (ACS), Housing Urban Development (HUD) fair market rent (fmr) data, Feeding America dataset. The main faculty for this project is Blanca Himes and Sherry Xie, who assisted with geospatial plotting and formatting.
Describe the problem addressed, its significance, and some background to motivate the problem.
Do aggregate factors compare well to individual factors? This would be particularly useful since census data is extensive.
Explain why your problem is interdisciplinary, what fields can contribute to its understanding, and incorporate background related to what you learned from meeting with faculty/staff.
Describe the data used and general methodological approach. Subsequently, incorporate full R code necessary to retrieve and clean data, and perform analysis. Be sure to include a description of code so that others (including your future self) can understand what you are doing and why. Use 5-year ACS data from 2014-2018 for poverty rate and median income. Collect HUD fmr rate for the past from 2017-2020. Finally I use the “Map the Gap” calculated county-level meal cost to estimate food expense. The cost-of-food index is calculated by applying county-level sales tax rates to Nielsen basket-of-goods based on actual pounds purchased. This value was last calculated in 2010 by Feeding America. Necessary corrections to nominal housing costs for inflation were corrected using CPI values from 2010-2020. BRFSS data is ta
factors: poverty rate, median income, median income spent on food, median income spent on rent, race, sex, education level. Look into high versus low cost-of-living metropolitan areas.
Download and install Rtools40 to use the BLS api: https://cran.r-project.org/bin/windows/Rtools/
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
##
## extract
## #Uighur
# tools for accessing BLS data
# install_github("mikeasilva/blsAPI", force = TRUE) # Install the blsAPI by Mike Silva
library(devtools)## Loading required package: usethis
# query data from BLS.gov
# load api key (must have the txt file)
api_key <- read.csv("bls_api_key.txt", header=FALSE)[[1]]
# signatures to be queried from BLS.gov, 10-year CPI data from major metropolitan areas
signatures <- read.csv("metro_area_cpi_signatures.csv")
# payload signatures to query BLS
month_payload <- list('seriesid'= signatures$signature_monthly) #2018-2020 monthly CPI
annual_payload <- list('seriesid'= signatures$signature_annual) #2018-2020 annual CPI
# query CPI 2018-2020 data from major metropolitan areas from BLS
months_cpi <- blsAPI(month_payload, return_data_frame = TRUE) # monthly CPI dataframe
annual_cpi <- blsAPI(annual_payload, return_data_frame = TRUE) # annual CPI dataframe# add columns to join region labels by
months_cpi$signature_monthly <- months_cpi$seriesID
annual_cpi$signature_annual <- annual_cpi$seriesID
# join region and state labels to the cpi data
# complete monthly dataframe for 2018-2020 regional metro cpi data
months_cpi %<>%
left_join(signatures[, c("signature_monthly","state","area")],by="signature_monthly") %>%
dplyr::select(-seriesID)
# complete annual dataframe for 2018-2020 regional metro cpi data
annual_cpi %<>%
left_join(signatures[, c("signature_annual","state","area")],by="signature_annual") %>%
dplyr::select(-seriesID)
head(months_cpi)## year period periodName value signature_monthly state area
## 1 2020 M10 October 260.892 CUURS35ESA0 MD Baltimore-Columbia-Towson
## 2 2020 M08 August 259.336 CUURS35ESA0 MD Baltimore-Columbia-Towson
## 3 2020 M06 June 257.942 CUURS35ESA0 MD Baltimore-Columbia-Towson
## 4 2020 M04 April 258.978 CUURS35ESA0 MD Baltimore-Columbia-Towson
## 5 2020 M02 February 259.121 CUURS35ESA0 MD Baltimore-Columbia-Towson
## 6 2019 M12 December 257.847 CUURS35ESA0 MD Baltimore-Columbia-Towson
## year period periodName value signature_annual state area
## 1 2020 S01 1st Half 258.891 CUUSS35ESA0 MD Baltimore-Columbia-Towson
## 2 2019 S02 2nd Half 257.288 CUUSS35ESA0 MD Baltimore-Columbia-Towson
## 3 2019 S01 1st Half 256.485 CUUSS35ESA0 MD Baltimore-Columbia-Towson
## 4 2018 S02 2nd Half 254.382 CUUSS35ESA0 MD Baltimore-Columbia-Towson
## 5 2018 S01 1st Half 252.401 CUUSS35ESA0 MD Baltimore-Columbia-Towson
## 6 2020 S01 1st Half 244.889 CUUSS35CSA0 GA Atlanta-Sandy Springs-Roswell
# Further format dataframes
# pivot data by metro area and years, add column for mean annual CPI
# annual cpi
a_cpi <- annual_cpi %>%
pivot_wider(id_cols = c("area","year"),names_from = "periodName",
values_from = "value")%>% # pivot on area and year using values from period
mutate_at(vars(3:4), as.numeric) %>% # change CPI values to numeric
mutate(annual = rowMeans(select(., c("1st Half", "2nd Half")), na.rm= TRUE))# mean annual cpi
# monthly cpi
m_cpi <- months_cpi %>%
dplyr::select(-period) %>%
pivot_wider(id_cols = c("area","year"), names_from = "periodName",
values_from = "value") %>% # pivot on area and year using values from period
mutate_at(vars(2:14), as.numeric) %>% # change CPI values to numeric
mutate(annual = rowMeans(select(.,-c("area", "year")), na.rm = TRUE)) # mean annual cpi
head(a_cpi)## # A tibble: 6 x 5
## area year `1st Half` `2nd Half` annual
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 Baltimore-Columbia-Towson 2020 259. NA 259.
## 2 Baltimore-Columbia-Towson 2019 256. 257. 257.
## 3 Baltimore-Columbia-Towson 2018 252. 254. 253.
## 4 Atlanta-Sandy Springs-Roswell 2020 245. NA 245.
## 5 Atlanta-Sandy Springs-Roswell 2019 242. 246. 244.
## 6 Atlanta-Sandy Springs-Roswell 2018 238. 239. 239.
## # A tibble: 6 x 15
## area year October August June April February December September July May March January November annual
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Baltimore-Columbia-Towson 2020 261. 259. 258. 259. 259. NA NA NA NA NA NA NA 259.
## 2 Baltimore-Columbia-Towson 2019 258. 257. 257. 259. 254. 258. NA NA NA NA NA NA 257.
## 3 Baltimore-Columbia-Towson 2018 255. 255. 254. 252. 252. 253. NA NA NA NA NA NA 254.
## 4 Atlanta-Sandy Springs-Roswell 2020 249. 248. 245. 243. 247. NA NA NA NA NA NA NA 246.
## 5 Atlanta-Sandy Springs-Roswell 2019 246. 246. 243. 243. 240. 245. NA NA NA NA NA NA 244.
## 6 Atlanta-Sandy Springs-Roswell 2018 239. 241. 240. 237. 237 237. NA NA NA NA NA NA 239.
a_cpi_growth_rate <- as.data.frame(a_cpi) %>%
dplyr::select(area, annual) %>%
dplyr::group_by(area) %>%
dplyr::summarise(growth = (max(annual)-min(annual))/min(annual)*100) #growth from 2018-2020## `summarise()` ungrouping output (override with `.groups` argument)
ggplot(a_cpi_growth_rate, aes(x = reorder(area, -growth), y = growth))+
geom_col() +
theme(axis.text.x = element_text(angle = 90)) +
theme(axis.title.x = element_blank())+
geom_hline(yintercept=2.2, linetype="dashed", color = "red")+
ylab("Growth (% CPI Change 2018-2020)")## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
## Loading required package: grid
library(stringr)
library(grid)
# function to format the geo id column of FMR dataframes
format_id <- function(df){
df$ID <- str_replace(as.character(df$fips2010),
as.character(df$CouSub),"")
df$ID <- as.numeric(df$ID) + 50000000000
df$ID <- paste0("0",as.character(df$ID))
return(df)
}
# read in the housing costs: fair market rates (2017-2020)
# format relevent columns into a standard form
fmr_2017 <- read.csv("FY2017_50_County_FMR.csv", fileEncoding = "UTF-8-BOM") %>%
dplyr::mutate_at(c("fmr0", "fmr1", "fmr2", "fmr3", "fmr4"), as.numeric)
fmr_2018 <- read.csv("FY2018_50_County_FMR.csv", fileEncoding = "UTF-8-BOM") %>%
dplyr::mutate_at(c("fmr_0", "fmr_1", "fmr_2", "fmr_3", "fmr_4"), as.numeric) %>%
dplyr::rename("CouSub" = "cousub", "fmr0" = "fmr_0","fmr1" = "fmr_1","fmr2" = "fmr_2",
"fmr3" = "fmr_3", "fmr4" = "fmr_4", "Metro_code" = "metro_code")
fmr_2019 <- read.csv("FY2019_50_County_FMR.csv", fileEncoding = "UTF-8-BOM") %>%
dplyr::mutate_at(c("fmr_0", "fmr_1", "fmr_2", "fmr_3", "fmr_4"), as.numeric) %>%
dplyr::rename("CouSub" = "cousub","fmr0" = "fmr_0","fmr1" = "fmr_1","fmr2" = "fmr_2",
"fmr3" = "fmr_3", "fmr4" = "fmr_4", "Metro_code" = "metro_code")
fmr_2020 <- read.csv("FY2020_50_County_FMR.csv", fileEncoding = "UTF-8-BOM") %>%
dplyr::mutate_at(c("fmr_0", "fmr_1", "fmr_2", "fmr_3", "fmr_4"), as.numeric) %>%
dplyr::rename("CouSub" = "cousub", "fmr0" = "fmr_0","fmr1" = "fmr_1","fmr2" = "fmr_2",
"fmr3" = "fmr_3", "fmr4" = "fmr_4", "Metro_code" = "metro_code")
fmr_2021 <- read.csv("FY2021_50_County_FMR.csv", fileEncoding = "UTF-8-BOM") %>%
dplyr::mutate_at(c("rent50_0", "rent50_1", "rent50_2", "rent50_3", "rent50_4"),
as.numeric) %>%
dplyr::rename("CouSub" = "cousub", "fmr0" = "rent50_0",
"fmr1" = "rent50_1","fmr2" = "rent50_2",
"fmr3" = "rent50_3", "fmr4" = "rent50_4",
"Metro_code" = "cbsasub21", "areaname" = "areaname21",
"countyname" = "cntyname")
# read in shape file of US counties
counties <- readRDS(gzcon(url('https://raw.githubusercontent.com/HimesGroup/BMIN503/master/DataFiles/uscounties_2010.rds'))) # decompress rds file and read in shape file
counties$GEO_ID <- as.character(counties$GEO_ID)#change geo id to characters in counties df
counties$ID <- gsub("US","", counties$GEO_ID) #remove US from ID
# read in metro area shape file and convert into spatial components
setwd("tl_2017_us_cbsa")
cbsa.sf <- st_read("tl_2017_us_cbsa.shp")## Reading layer `tl_2017_us_cbsa' from data source `C:\Users\chana\Box Sync\Graduate School\Courses\Fall 2020\BMIN 503\Final Project\BMIN503_Final_Project\tl_2017_us_cbsa\tl_2017_us_cbsa.shp' using driver `ESRI Shapefile'
## Simple feature collection with 945 features and 12 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -178.4436 ymin: 17.83151 xmax: -65.42271 ymax: 65.45352
## geographic CRS: NAD83
cbsa.sf %<>%
dplyr::select(GEOID, geometry,NAME, NAMELSAD) %>%
dplyr::rename(ID2 = GEOID, Metro = NAMELSAD)
# function to merge FMR dataframes with county geometries
geo_format <- function(geo_df){
fmr_geo <- counties %>%
dplyr::select(c("NAME", "ID", "geometry")) %>%
left_join(geo_df[,c("ID","fmr0", "fmr1", "fmr2", "fmr3", "fmr4",
"Metro_code", "areaname", "countyname")], by="ID") %>%
distinct(Metro_code, .keep_all = TRUE)
}
# function to merge FMR dataframes with metro area geometries
geo_format2 <- function(geo_df){
geo_df %<>%
dplyr::select(ID,fmr0,fmr1,fmr2, fmr3, fmr4, Metro_code, areaname, countyname)%>%
distinct(Metro_code, .keep_all = TRUE) %>% # removed repeated counties
dplyr::mutate(ID2 = substr(Metro_code, 12,16)) # metro IDs
fmr_geo <- cbsa.sf %>%
dplyr::select(c("NAME", "ID2", "geometry", "Metro")) %>%
left_join(geo_df, by="ID2") # join by metro area
}
# format geographical IDs for all FMRs 2017-20201
fmr_2017 <- format_id(fmr_2017)
fmr_2018 <- format_id(fmr_2018)
fmr_2019 <- format_id(fmr_2019)
fmr_2020 <- format_id(fmr_2020)
fmr_2021 <- format_id(fmr_2021)
# add county geographical data to all FMRs 2017-20201
fmr_2017_geo <- geo_format(fmr_2017)
fmr_2018_geo <- geo_format(fmr_2018)
fmr_2019_geo <- geo_format(fmr_2019)
fmr_2020_geo <- geo_format(fmr_2020)
fmr_2021_geo <- geo_format(fmr_2021)
# add metro area geographical data to all FMRs 2017-20201
fmr_2017_geo2 <- geo_format2(fmr_2017)
fmr_2018_geo2 <- geo_format2(fmr_2018)
fmr_2019_geo2 <- geo_format2(fmr_2019)
fmr_2020_geo2 <- geo_format2(fmr_2020)
fmr_2021_geo2 <- geo_format2(fmr_2021)library(RColorBrewer)
library(leaflet)
my_theme <- function() {
theme_minimal() + # shorthand for white background color
theme(axis.line = element_blank(), # further customization of theme components
axis.text = element_blank(), # remove x and y axis text and labels
axis.title = element_blank(),
panel.grid = element_line(color = "white"), # make grid lines invisible
legend.key.size = unit(0.8, "cm"), # increase size of legend
legend.text = element_text(size = 16), # increase legend text size
legend.title = element_text(size = 16),
plot.title = element_text(hjust = 0.5)) # increase legend title size
}
myPalette <- colorRampPalette(brewer.pal(9, "BuPu")) # Blue-Purple
myPalette2 <- colorRampPalette(brewer.pal(9, "YlOrRd")) # Yellow-Orange-Red
myPalette3 <- colorRampPalette(brewer.pal(9, "YlGnBu")) # Yellow-Green-Blue
# geographic visualizations of housing costs
cpi_by_year <- read.csv("annual_cpi_us.csv", fileEncoding = "UTF-8-BOM")
cpi_by_year$adjust <- cpi_by_year$cpi/258.2937 # adjustment factor into 2020 dollars
# use the cpi_by_year adjustment to adjust rents to 2010 dollars
# rent in y year = (rent in current year) X (CPI y year/CPI current year)
# Select a color palette with which to run the palette function
pal_fun <- colorNumeric("BuPu", NULL) # Blue-Purple from RColorBrewer
pal_fun2 <- colorNumeric("YlOrRd", NULL) # Yellow-Orange-Red from RColorBrewer
# function to get inflation correction factor for a given year into 2020 dollars
i_factor <- function(year){
c_factor <- cpi_by_year$adjust[which(cpi_by_year$year == year)]
return(c_factor)
}
# function to generate static map of rent costs by year across all coutnies in US
map_rent <- function(df, year){
correction = i_factor(year)
fmr_map = ggplot() +
geom_sf(data=df,lwd=0,
aes(fill = fmr1/correction)) + # adjust factor included
ggtitle(paste(year, "Median Rent for 1-bed")) +
scale_fill_gradientn(name = "Rent (2020 $)",
colours = myPalette2(100),
limits = c(0,2000),
breaks = c(500,1000,1500,2000)) + # add limits and breaks to normalize scaling between years
north(data = df, symbol = 12, location = "bottomright") +
scalebar(data = df, dist = 500, dist_unit = 'mi', transform = TRUE,
location = "bottomleft",
st.dist = 0.1) +
my_theme()
}
# call static map function and display
fmr1_2017_map <- map_rent(fmr_2017_geo, 2017) # generate static rent map for counties
fmr1_2017_map # display rent map# call static map function and display
fmr1_2017_map2 <- map_rent(fmr_2017_geo2, 2017) # generate static rent map for metro areas
fmr1_2017_map2 # display rent map# an interactive map of rent prices
library(htmlwidgets)
library(htmltools)
pal_fun <- colorNumeric("YlOrRd", NULL) # Yellow-Orange-Red from RColorBrewer
# ------------------------------------------------- #
# HTML code to fix misaligned "na" in map legend
# ------------------------------------------------- #
library(htmlwidgets)
css_fix <- "div.info.legend.leaflet-control br {clear: both;}" # CSS to correct spacing
html_fix <- htmltools::tags$style(type = "text/css", css_fix) # Convert CSS to HTML
# Funciton allows to plot interactive leaflet maps of median US rents by year
map_rent_i <- function(df, year){
#---------------------------------------------------#
# HTML code to add title to interactive maps
#---------------------------------------------------#
tag.map.title <- tags$style(HTML("
.leaflet-control.map-title {
transform: translate(-50%,20%);
position: fixed !important;
left: 50%;
text-align: center;
padding-left: 10px;
padding-right: 10px;
background: rgba(255,255,255,0.75);
font-weight: bold;
color: #000000;
font-size: 20px;
}
"))
title <- tags$div(
tag.map.title, HTML(paste(year, "Median Rent:", 1, "Bed"))
)
correction = i_factor(year)# find correction factor
# Pop-up message
pu_message <- paste0("County: ", df$countyname,
"<br>Rent: ", "$", floor(df$fmr1/correction)) # in 2010 dollars
# Adding more customization
interactive_map <- leaflet(df) %>%
addPolygons(stroke = FALSE, # remove polygon borders
fillColor = ~pal_fun(fmr1/correction), # in 2010 dollars
fillOpacity = 0.5, smoothFactor = 0.5, # increase opacity and resolution
popup = ~pu_message) %>%
addProviderTiles(providers$CartoDB.Positron) %>% # add third party provider tile
#addProviderTiles(providers$Stamen.Toner) %>%
#addProviderTiles(providers$Esri.NatGeoWorldMap)
addLegend("bottomright", # location of legend
pal=pal_fun, # palette function
values=~fmr1/correction, # variable to be passed to palette function
title = paste("Rent",1,"bed (2020 $)"), # legend title
opacity = 1) %>% # legend opacity (1 = completely opaque)
addScaleBar() %>%
addTiles() %>%
addControl(title, position = "topleft", className="map-title") #add title panel
return(interactive_map %<>% htmlwidgets::prependContent(html_fix))
}
# Call interactive map function and display
fmr_2020_mapi <- map_rent_i(fmr_2020_geo, 2020)
fmr_2020_mapilibrary(tidycensus)
library(tidyr)
census_key <- read.delim("census_key.txt", header=FALSE)[1,1] # read in api key from local
# census_api_key(census_key,install=TRUE)acs.data <- get_acs(geography = "county", # query data at the county level
year = 2018, # end year (these will give us ACS 5-year estimates for 2014-2018)
variables = c("B17010_002", # number of families falling below the poverty threshold
"B17010_001", #total number of families for which poverty was determined
"B19013_001"), # median household income
key = census_key) ## Getting data from the 2014-2018 5-year ACS
acs.data %<>%
dplyr::select(-moe) #remove the margin of error
# pivot on both estimate and total variables
acs.data <- pivot_wider(data=acs.data, names_from="variable", values_from="estimate")
acs.data %<>%
dplyr::rename(poverty_total = B17010_001,
poverty_estimate = B17010_002,
med_house_income = B19013_001) %>% # give vars meaningful names
dplyr::mutate(ID = paste0("0", as.character(050000000000 + as.numeric(GEOID)))) #create ID column to merge on# geospatial exploration of ACS median income from 2014-2018 by county
# join acs data
acs.data.geo <- left_join(counties, acs.data[,c("ID", "poverty_total",
"poverty_estimate", "med_house_income")],
by = "ID") # merge to counties
acs.data.geo %<>% # clean up the dataframe
dplyr::select(GEO_ID, ID, COUNTY, poverty_total, poverty_estimate,
med_house_income,geometry)
# plot income map
income_map <- ggplot() +
geom_sf(data=acs.data.geo,lwd=0,
aes(fill = med_house_income)) + # adjust factor included
ggtitle("Median Household Income (2014-2018)") +
scale_fill_gradientn(name = "Income",
colours = myPalette2(100)) + # add limits and breaks to normalize scaling between years
north(data = acs.data.geo, location = "bottomright", symbol = 12) +
scalebar(data = acs.data.geo, dist = 500, dist_unit = 'mi', transform = TRUE,
location = "bottomleft",
st.dist = 0.1) +
my_theme()
income_map # show the median income maplibrary(ggpubr)
options(scipen=10000)
# joins acs data and fmr data for any given fmr year
acs_join <- function(acs.geo.df, fmr.geo.df){
acs.df <- dplyr::select(as.data.frame(acs.data.geo), -geometry)
acs_fmr <- left_join(fmr.geo.df, acs.df, by = "ID")
return(acs_fmr)
}
# combine acs data with fmr data
acs.fmr.2017 <- acs_join(acs.data.geo, fmr_2017_geo)
acs.fmr.2018 <- acs_join(acs.data.geo, fmr_2018_geo)
acs.fmr.2019 <- acs_join(acs.data.geo, fmr_2019_geo)
acs.fmr.2020 <- acs_join(acs.data.geo, fmr_2020_geo)
acs.fmr.2021 <- acs_join(acs.data.geo, fmr_2021_geo) # calculate poverty rate and percent spent on housing by county for one year
thresh <- 0 # threshold for filtering poverty estimates
acs.fmr.2017 %<>%
dplyr::filter(poverty_estimate > thresh) %>%
dplyr::mutate(poverty_percent = poverty_estimate/poverty_total) %>%
dplyr::mutate(housing_percent = (fmr1*12)/med_house_income)
acs.fmr.2018 %<>%
dplyr::filter(poverty_estimate > thresh) %>%
dplyr::mutate(poverty_percent = poverty_estimate/poverty_total) %>%
dplyr::mutate(housing_percent = (fmr1*12)/med_house_income)
acs.fmr.2019 %<>%
dplyr::filter(poverty_estimate > thresh) %>%
dplyr::mutate(poverty_percent = poverty_estimate/poverty_total) %>%
dplyr::mutate(housing_percent = (fmr1*12)/med_house_income)
acs.fmr.2020 %<>%
dplyr::filter(poverty_estimate > thresh) %>%
dplyr::mutate(poverty_percent = poverty_estimate/poverty_total) %>%
dplyr::mutate(housing_percent = (fmr1*12)/med_house_income)
# plot map of percentage spent on median 1-bed apartment by median income
rent_income_map <- ggplot() +
geom_sf(data=acs.fmr.2017,lwd=0,
aes(fill = housing_percent)) + # adjust factor included
ggtitle("Percent of income spent on rent (2017 1-Bed)") +
scale_fill_gradientn(name = "% spent on rent",
colours = myPalette2(100)) + # add limits and breaks to normalize scaling between years
north(data = acs.fmr.2017, location = "bottomright", symbol = 12) +
scalebar(data = acs.fmr.2017, dist = 500, dist_unit = 'mi', transform = TRUE,
location = "bottomleft",
st.dist = 0.1) +
my_theme()
rent_income_map # show the median income map# relationship between amount spent on housing and poverty rate
# FMR 1-bed data from 2017-2020
# Median income estimated from 2014-2018 ACS data
# generate linear regression fits for the relationship between housing burdern and poverty
# looking at income alone across US counties
income_2017 <- ggplot(data = acs.fmr.2017, aes(x=med_house_income, y=poverty_percent))+
geom_point() +
xlab("Median Income ($)") +
ylab("Poverty Rate %") +
ggtitle("2017 US Counties ") +
theme(plot.title = element_text(hjust = 0.5))
income_2018 <- ggplot(data = acs.fmr.2018, aes(x=med_house_income, y=poverty_percent))+
geom_point() +
xlab("Median Income ($)") +
ylab("Poverty Rate %") +
ggtitle("2018 US Counties ") +
theme(plot.title = element_text(hjust = 0.5))
income_2019 <- ggplot(data = acs.fmr.2019, aes(x=med_house_income, y=poverty_percent))+
geom_point() +
xlab("Median Income ($)") +
ylab("Poverty Rate %") +
ggtitle("2019 US Counties ") +
theme(plot.title = element_text(hjust = 0.5))
income_2020 <- ggplot(data = acs.fmr.2020, aes(x=med_house_income, y=poverty_percent))+
geom_point() +
xlab("Median Income ($)") +
ylab("Poverty Rate %") +
ggtitle("2020 US Counties ") +
theme(plot.title = element_text(hjust = 0.5))
combined_income <- cowplot::plot_grid(income_2017,income_2018,
income_2019,income_2020)
combined_income# looking at housing cost alone across US counties
housing_2017 <- ggplot(data = acs.fmr.2017, aes(x=fmr1, y=poverty_percent))+
geom_point() +
xlab("Median Rent (1-Bed)") +
ylab("Poverty Rate %") +
ggtitle("2017 US Counties ") +
theme(plot.title = element_text(hjust = 0.5))
housing_2018 <- ggplot(data = acs.fmr.2018, aes(x=fmr1, y=poverty_percent))+
geom_point() +
xlab("Median Rent (1-Bed)") +
ylab("Poverty Rate %") +
ggtitle("2018 US Counties ") +
theme(plot.title = element_text(hjust = 0.5))
housing_2019 <- ggplot(data = acs.fmr.2019, aes(x=fmr1, y=poverty_percent))+
geom_point() +
xlab("Median Rent (1-Bed)") +
ylab("Poverty Rate %") +
ggtitle("2019 US Counties ") +
theme(plot.title = element_text(hjust = 0.5))
housing_2020 <- ggplot(data = acs.fmr.2020, aes(x=fmr1, y=poverty_percent))+
geom_point() +
xlab("Median Rent (1-Bed)") +
ylab("Poverty Rate %") +
ggtitle("2020 US Counties ") +
theme(plot.title = element_text(hjust = 0.5))
combined_housing <- cowplot::plot_grid(housing_2017, housing_2018,
housing_2019, housing_2020)
combined_housing# looking at income and housing cost combined across US counties
h_burden_2017 <- ggplot(data = acs.fmr.2017, aes(x=housing_percent, y=poverty_percent))+
geom_point() +
geom_smooth(method = "lm", formula = y~x) +
xlab("Percent Spent on Rent") +
ylab("Poverty Rate %") +
ggtitle("2017 US Counties ") +
theme(plot.title = element_text(hjust = 0.5)) +
stat_regline_equation()
h_burden_2018 <- ggplot(data = acs.fmr.2018, aes(x=housing_percent, y=poverty_percent))+
geom_point() +
geom_smooth(method = "lm", formula = y~x) +
xlab("Percent Spent on Rent") +
ylab("Poverty Rate %") +
ggtitle("2018 US Counties ") +
theme(plot.title = element_text(hjust = 0.5)) +
stat_regline_equation()
h_burden_2019 <-ggplot(data = acs.fmr.2019, aes(x=housing_percent, y=poverty_percent))+
geom_point() +
geom_smooth(method = "lm", formula = y~x) +
xlab("Percent Spent on Rent") +
ylab("Poverty Rate %") +
ggtitle("2019 US Counties ") +
theme(plot.title = element_text(hjust = 0.5)) +
stat_regline_equation()
h_burden_2020 <-ggplot(data = acs.fmr.2020, aes(x=housing_percent, y=poverty_percent))+
geom_point() +
geom_smooth(method = "lm", formula = y~x) +
xlab("Percent Spent on Rent") +
ylab("Poverty Rate %") +
ggtitle("2020 US Counties ") +
theme(plot.title = element_text(hjust = 0.5)) +
stat_regline_equation()
combined_h_burden <- cowplot::plot_grid(h_burden_2017, h_burden_2018,
h_burden_2019, h_burden_2020)
combined_h_burden # display the combined graphsetwd("Map the Meal Gap data")
# map the gap county data 2010
food <- read.csv("MMG2012_2010Data_ToShare.csv", fileEncoding = "UTF-8-BOM")
food %<>%
dplyr::select(FIPS, State,County..State,X2010.Food.Insecurity.Rate,X2010.Cost.Per.Meal) %>%
dplyr::rename(County = County..State, food_insecure = X2010.Food.Insecurity.Rate,
meal_cost = X2010.Cost.Per.Meal) %>%
dplyr::mutate(ID = paste0("0", as.character(050000000000 + as.numeric(FIPS))))
food$meal_cost = as.numeric(gsub("[$]","",food$meal_cost))## Warning: NAs introduced by coercion
# join map the gap data to acs/fmr data
acs.fmr.food.2017 <- left_join(acs.fmr.2017, food[,c("ID","food_insecure",
"meal_cost")], by = "ID") %>%
dplyr::mutate(meal_percent = meal_cost*3*365/med_house_income)
acs.fmr.food.2018 <- left_join(acs.fmr.2018, food[,c("ID","food_insecure",
"meal_cost")], by = "ID") %>%
dplyr::mutate(meal_percent = meal_cost*3*365/med_house_income)
acs.fmr.food.2019 <- left_join(acs.fmr.2019, food[,c("ID","food_insecure",
"meal_cost")], by = "ID") %>%
dplyr::mutate(meal_percent = meal_cost*3*365/med_house_income)
acs.fmr.food.2020 <- left_join(acs.fmr.2020, food[,c("ID","food_insecure",
"meal_cost")], by = "ID") %>%
dplyr::mutate(meal_percent = meal_cost*3*365/med_house_income)
# look at food costs in isolation
food_cost <- ggplot(acs.fmr.food.2017, aes(x=meal_cost, y=poverty_percent))+
geom_point()+
xlab("Meal Cost") +
ylab("Poverty Rate %") +
ggtitle("Meal Cost") +
theme(plot.title = element_text(hjust = 0.5))
# look at food cost as percent of median income
food_burden <- ggplot(acs.fmr.food.2017, aes(x=meal_percent, y=poverty_percent))+
geom_point()+
geom_smooth(method = "lm", formula = y~x) +
xlab("Percent Spent on Food") +
ylab("Poverty Rate %") +
ggtitle("Income spent on Food") +
theme(plot.title = element_text(hjust = 0.5)) +
stat_regline_equation()
combined_food_burden <- cowplot::plot_grid(food_cost, food_burden)
combined_food_burden # display the combined graph# generate a two-factor cost of living index by adding food and housing costs
acs.fmr.food.2017 %<>%
mutate(COL = meal_percent + housing_percent)
acs.fmr.food.2018 %<>%
mutate(COL = meal_percent + housing_percent)
acs.fmr.food.2019 %<>%
mutate(COL = meal_percent + housing_percent)
acs.fmr.food.2020 %<>%
mutate(COL = meal_percent + housing_percent)
COL_2017 <- ggplot(acs.fmr.food.2017, aes(x=COL, y=poverty_percent))+
geom_point()+
geom_smooth(method = "lm", formula = y~x) +
xlab("Combined food and housing costs") +
ylab("Poverty Rate %") +
ggtitle("2017 US Counties ") +
theme(plot.title = element_text(hjust = 0.5)) +
stat_regline_equation()
COL_2018 <- ggplot(acs.fmr.food.2018, aes(x=COL, y=poverty_percent))+
geom_point()+
geom_smooth(method = "lm", formula = y~x) +
xlab("Combined food and housing costs") +
ylab("Poverty Rate %") +
ggtitle("2018 US Counties ") +
theme(plot.title = element_text(hjust = 0.5)) +
stat_regline_equation()
COL_2019 <- ggplot(acs.fmr.food.2019, aes(x=COL, y=poverty_percent))+
geom_point()+
geom_smooth(method = "lm", formula = y~x) +
xlab("Combined food and housing costs") +
ylab("Poverty Rate %") +
ggtitle("2019 US Counties ") +
theme(plot.title = element_text(hjust = 0.5)) +
stat_regline_equation()
COL_2020 <- ggplot(acs.fmr.food.2020, aes(x=COL, y=poverty_percent))+
geom_point()+
geom_smooth(method = "lm", formula = y~x) +
xlab("Combined food and housing costs") +
ylab("Poverty Rate %") +
ggtitle("2020 US Counties ") +
theme(plot.title = element_text(hjust = 0.5)) +
stat_regline_equation()
combined_COL <- cowplot::plot_grid(COL_2017,COL_2018,COL_2019,COL_2020)
combined_COL # display the combined graphBased on these analyses, poverty rate and income are directly correlated as expected. However, median housing cost and food expenses alone do not seem to be directly associated with the poverty rate. Furthermore, calculated variables such as percent-spent-on-housing and percent-spent-on-food do not seem to give better correlations to poverty rate compared to median income alone, From these results I will use housing costs, food costs, and either poverty rate or median income from aggreate county data to predict health outcomes from BRFSS data.
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following object is masked from 'package:ggpubr':
##
## mutate
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise, summarize
pov_quant_2017 <- acs.fmr.food.2017 %>%
dplyr::mutate(pov_quant = quantcut(poverty_percent, q=4,by=0.25, na.rm=TRUE))
pov_meal_quart <- ggplot(data=pov_quant_2017, aes(x=med_house_income,
y=meal_cost,
color = pov_quant))+
geom_point() +
xlab("Median Household Income ($)") +
ylab("Meal Cost ($)") +
ggtitle("US Counties 2017") +
theme(plot.title = element_text(hjust = 0.5))
pov_meal_quartpov_house_quart <- ggplot(data=pov_quant_2017, aes(x=med_house_income,
y=fmr1,
color = pov_quant))+
geom_point() +
xlab("Median Household Income ($)") +
ylab("Median Rent ($)") +
ggtitle("US Counties 2017") +
theme(plot.title = element_text(hjust = 0.5))
pov_house_quart# select variables of interest
# clean BRFSS 2017 dataframe
brfss2017 <- brfss2017_raw %>%
dplyr::select('MMSANAME', 'X_MMSA', 'X_RACE', 'SEX', 'X_AGEG5YR','EDUCA','INCOME2',
'ADDEPEV2', 'CHCCOPD1', 'X_RFHLTH') %>%
dplyr::rename('Metro Area'='MMSANAME','Metro_ID'='X_MMSA','age'='X_AGEG5YR','race'='X_RACE',
'sex'='SEX', 'education'='EDUCA','income'='INCOME2',
'depression'='ADDEPEV2','copd'='CHCCOPD1', 'health'='X_RFHLTH') %>%
dplyr::filter(race %in% c(1,2,3,5,8)) %>%
dplyr::filter(sex %in% c(1,2)) %>%
dplyr::filter(age %in% seq(13)) %>%
dplyr::filter(education %in% c(1,2,3,4,5,6)) %>%
dplyr::filter(income %in% c(1,2,3,4,5,6,7,8)) %>%
dplyr::filter(depression %in% c(1,2)) %>%
dplyr::filter(copd %in% c(1,2)) %>%
dplyr::filter(health %in% c(1,2))
brfss2017 %<>%
dplyr::mutate('Metro_ID' = as.character(brfss2017$'Metro_ID')) %>%
dplyr::mutate(race = factor(race, levels = c(1,2,3,5,8),
labels = c( "White", "Black", "American Indian",
"Asian", "Hispanic"))) %>%
dplyr::mutate(sex = factor(sex, levels = c(1,2),
labels = c("Male", "Female"))) %>%
dplyr::mutate(age = factor(age, levels = seq(13),
labels = c("18-24","25-29","30-34","35-39","40-44","45-49","50-54",
"55-59","60-64","65-69","70-74","75-29",">79"))) %>%
dplyr::mutate(education = factor(education, levels = c(1,2,3,4,5,6),
labels = c("None","Elementary", "Some HS", "HS",
"Some College", "College"))) %>%
dplyr::mutate(income = factor(income, levels = c(1,2,3,4,5,6,7,8),
labels = c("10k","15k", "20k", "25k", "35k", "50k", "75k","> 75k"))) %>%
dplyr::mutate(depression = factor(depression, levels = c(1,2),
labels = c("yes", "no"))) %>%
dplyr::mutate(copd = factor(copd, levels = c(1,2),
labels = c("yes", "no"))) %>%
dplyr::mutate(health = factor(health, levels = c(1,2),
labels = c("good", "poor"))) Let’s look at the summary of the clean 2017 data frame. I am interested in sex, age, income, education, and race as demographic factors. I am interested in previous depression, COPD, and general health as outcomes. From the summary, there is a bias in sex and age. This might be explained by the fact that BRFSS is a phone based survey. It seems that income and education is skewed twoards highly educated, high earning individuals as well. The BRFSS might be biased towards home-owners and landline users over cell-phone users.
## Metro Area Metro_ID race sex age education income depression copd health
## Length:176937 Length:176937 White :138185 Male :79948 18-24: 9372 None : 210 10k : 7348 yes: 35853 yes: 13249 good:146650
## Class :character Class :character Black : 18934 Female:96989 25-29: 9552 Elementary : 3202 15k : 7613 no :141084 no :163688 poor: 30287
## Mode :character Mode :character American Indian: 1806 30-34:10675 Some HS : 6744 20k :11613
## Asian : 385 35-39:11531 HS :41713 25k :14631
## Hispanic : 17627 40-44:11104 Some College:48317 35k :17058
## 45-49:13113 College :76751 50k :23933
## 50-54:15770 75k :27922
## 55-59:18469 > 75k:66819
## 60-64:19574
## 65-69:19402
## 70-74:15710
## 75-29:10526
## >79 :12139
cbsa2fips <- read.csv("cbsa2fipsxw.csv")
cbsa2fips %<>%
dplyr::mutate(fipsstatecode = as.character(cbsa2fips$fipsstatecode)) %>%
dplyr::mutate(fipscountycode = as.character(cbsa2fips$fipscountycode)) %>%
dplyr::mutate(fips_state = ifelse(nchar(fipsstatecode) == 1, paste0("0",fipsstatecode),
fipsstatecode)) %>%
dplyr::mutate(fips_county = ifelse(nchar(fipscountycode) == 2, paste0("0",fipscountycode),
ifelse(nchar(fipscountycode) == 1, paste0("00",fipscountycode), fipscountycode))) %>%
dplyr::mutate(ID = paste0("0", as.character(050000000000 + as.numeric(paste0(fips_state,fips_county))))) %>%
dplyr::rename('Metro_ID' = cbsacode) %>%
dplyr::select(ID, 'Metro_ID')
cbsa2fips$`Metro_ID` = as.character(cbsa2fips$`Metro_ID`)
# associate all counties to given metro area
acs.fmr.food.2017 %<>%
left_join(cbsa2fips, by="ID")# function to aggregate all acs.fmr.food cost of living data by metro area
# returns mean for all counties in a given metro area
metro_costs <- function(df){
df %<>%
dplyr::select(fmr0, fmr1, fmr2, fmr3, fmr4, Metro_ID, poverty_percent,
housing_percent, meal_cost, meal_percent, med_house_income)
df = as.data.frame(df)
df %<>%
dplyr::select(-geometry) %>%
group_by(Metro_ID) %>%
summarise_at(vars('fmr0', 'fmr1', 'fmr2', 'fmr3', 'fmr4', 'poverty_percent',
'housing_percent', 'meal_cost', 'meal_percent',
'med_house_income'), mean)
return(df)
}
col_2017 <- metro_costs(acs.fmr.food.2017)
# join 2017 metro cost of living to BRFSS2017 dataframe
brfss2017 %<>%
left_join(col_2017, by = "Metro_ID")
# remove NAs
brfss2017 %<>%
na.omit(brfss2017)metro_income_ghealth <- ggplot(data=brfss2017, aes(x=health, y = med_house_income))+
geom_boxplot() +
xlab("General Health") +
ylab("Median Household Income ($)") +
ggtitle("BRFSS2017 General Health") +
theme(plot.title = element_text(hjust = 0.5))
metro_income_copd <- ggplot(data=brfss2017, aes(x=copd, y = med_house_income))+
geom_boxplot() +
xlab("COPD History") +
ylab("Median Household Income ($)") +
ggtitle("BRFSS2017 COPD History") +
theme(plot.title = element_text(hjust = 0.5))
metro_income_depression <- ggplot(data=brfss2017, aes(x=depression, y = med_house_income))+
geom_boxplot() +
xlab("Depression History") +
ylab("Median Household Income ($)") +
ggtitle("BRFSS2017 Depression") +
theme(plot.title = element_text(hjust = 0.5))
metro_income_health_outcomes <- cowplot::plot_grid(metro_income_ghealth,metro_income_copd,
metro_income_depression)
metro_income_health_outcomesmetro_fmr1_ghealth <- ggplot(data=brfss2017, aes(x=health, y = fmr1))+
geom_boxplot() +
xlab("General Health") +
ylab("Median 1-Bed Rent ($)") +
theme(plot.title = element_text(hjust = 0.5))
metro_fmr1_copd <- ggplot(data=brfss2017, aes(x=copd, y = fmr1))+
geom_boxplot() +
xlab("COPD History") +
ylab("Median 1-Bed Rent ($)") +
theme(plot.title = element_text(hjust = 0.5))
metro_fmr1_depression <- ggplot(data=brfss2017, aes(x=depression, y = fmr1))+
geom_boxplot() +
xlab("Depression History") +
ylab("Median 1-Bed Rent ($)") +
theme(plot.title = element_text(hjust = 0.5))
metro_rent_health_outcomes <- cowplot::plot_grid(metro_fmr1_ghealth,metro_fmr1_copd,
metro_fmr1_depression)
metro_rent_health_outcomesmetro_food_ghealth <- ggplot(data=brfss2017, aes(x=health, y = meal_cost))+
geom_boxplot() +
xlab("General Health") +
ylab("Cost of Meal ($)") +
theme(plot.title = element_text(hjust = 0.5))
metro_food_copd <- ggplot(data=brfss2017, aes(x=copd, y = meal_cost))+
geom_boxplot() +
xlab("COPD History") +
ylab("Cost of Meal ($)") +
theme(plot.title = element_text(hjust = 0.5))
metro_food_depression <- ggplot(data=brfss2017, aes(x=depression, y = meal_cost))+
geom_boxplot() +
xlab("Depression History") +
ylab("Cost of Meal ($)") +
theme(plot.title = element_text(hjust = 0.5))
metro_food_health_outcomes <- cowplot::plot_grid(metro_food_ghealth,metro_food_copd,
metro_food_depression)
metro_food_health_outcomesmetro_pov_ghealth <- ggplot(data=brfss2017, aes(x=health, y = poverty_percent))+
geom_boxplot() +
xlab("General Health") +
ylab("Poverty Rate") +
theme(plot.title = element_text(hjust = 0.5))
metro_pov_copd <- ggplot(data=brfss2017, aes(x=copd, y = poverty_percent))+
geom_boxplot() +
xlab("COPD History") +
ylab("Poverty Rate") +
theme(plot.title = element_text(hjust = 0.5))
metro_pov_depression <- ggplot(data=brfss2017, aes(x=depression,
y= poverty_percent))+
geom_boxplot() +
xlab("Depression History") +
ylab("Poverty Rate") +
theme(plot.title = element_text(hjust = 0.5))
metro_pov_health_outcomes <- cowplot::plot_grid(metro_pov_ghealth,metro_pov_copd,
metro_pov_depression)
metro_pov_health_outcomes# income and general health
ggplot(data=brfss2017, aes(x=income, fill=health)) +
geom_bar(position="dodge") +
ggtitle('Individual Income and General Health')+
xlab('Income')+
ylab('Count') +
theme(plot.title = element_text(hjust = 0.5)) # income and depression
ggplot(data=brfss2017, aes(x=income, fill=depression)) +
geom_bar(position="dodge") +
ggtitle('Individual Income and Depression History')+
xlab('Income')+
ylab('Count') +
theme(plot.title = element_text(hjust = 0.5)) # income and COPD
ggplot(data=brfss2017, aes(x=income, fill=copd)) +
geom_bar(position="dodge") +
ggtitle('Individual Income and COPD History')+
xlab('Income')+
ylab('Count') +
theme(plot.title = element_text(hjust = 0.5)) health outcomes.
# chi-square test for individual income and general health
brfss2017_health_table <- table(brfss2017$health, brfss2017$income)
brfss2017_health_table##
## 10k 15k 20k 25k 35k 50k 75k > 75k
## good 3024 3345 5988 8708 10974 16542 20359 47235
## poor 2467 2603 3159 3343 3118 3340 2690 3325
##
## Pearson's Chi-squared test
##
## data: brfss2017_health_table
## X-squared = 13558, df = 7, p-value < 0.00000000000000022
# chi-square test for individual income and depression history
brfss2017_depression_table <- table(brfss2017$depression, brfss2017$income)
brfss2017_depression_table##
## 10k 15k 20k 25k 35k 50k 75k > 75k
## yes 2083 2196 2635 3141 3157 4095 4466 7479
## no 3408 3752 6512 8910 10935 15787 18583 43081
##
## Pearson's Chi-squared test
##
## data: brfss2017_depression_table
## X-squared = 3625.9, df = 7, p-value < 0.00000000000000022
# chi-square test for individual income and COPD history
brfss2017_COPD_table <- table(brfss2017$copd, brfss2017$income)
brfss2017_COPD_table##
## 10k 15k 20k 25k 35k 50k 75k > 75k
## yes 988 1234 1360 1492 1374 1630 1343 1553
## no 4503 4714 7787 10559 12718 18252 21706 49007
##
## Pearson's Chi-squared test
##
## data: brfss2017_COPD_table
## X-squared = 4926.9, df = 7, p-value < 0.00000000000000022
# Train a logistic regression model using BRFSS2017 data
depression_glm <- glm(formula=depression ~ race + sex + age + education + income,
data=brfss2017, family = binomial)
health_glm <- glm(formula=health ~ race + sex + age + education + income,
data=brfss2017, family = binomial(logit))
copd_glm <- glm(formula=copd ~ race + sex + age + education + income,
data=brfss2017, family = binomial(logit))library(PRROC)
# ROC for depression logistic regression
depression_pred <- predict(depression_glm, data=brfss2017, type="response")
depression_glm_roc <- roc.curve(depression_pred[brfss2017$depression == "no"],
depression_pred[brfss2017$depression == "yes"], curve=TRUE)
# ROC for general health logistic regression
health_pred <- predict(health_glm, data=brfss2017, type="response")
health_glm_roc <- roc.curve(health_pred[brfss2017$health == "poor"],
health_pred[brfss2017$health == "good"], curve=TRUE)
# ROC for COPD logistic regression
copd_pred <- predict(copd_glm, data=brfss2017, type="response")
copd_glm_roc <- roc.curve(copd_pred[brfss2017$copd == "no"],
copd_pred[brfss2017$copd == "yes"], curve=TRUE)
plot(depression_glm_roc)# Train a logistic regression model using BRFSS2017 data
depression_glm2 <- glm(formula=depression ~ race + sex + age + education + income +
med_house_income + fmr1 + meal_cost,
data=brfss2017, family = binomial(logit))
health_glm2 <- glm(formula=health ~ race + sex + age + education + income + med_house_income
+ fmr1 + meal_cost,
data=brfss2017, family = binomial(logit))
copd_glm2 <- glm(formula=copd ~ race + sex + age + education + income + med_house_income
+ fmr1 + meal_cost,
data=brfss2017, family = binomial(logit))# ROC for depression logistic regression
depression_pred2 <- predict(depression_glm2, data=brfss2017, type="response")
depression_glm_roc2 <- roc.curve(depression_pred2[brfss2017$depression == "no"],
depression_pred2[brfss2017$depression == "yes"], curve=TRUE)
head(depression_pred2)## 1 2 3 4 5 6
## 0.7964293 0.8271381 0.6311824 0.5844652 0.8867128 0.8109481
# ROC for general health logistic regression
health_pred2 <- predict(health_glm2, data=brfss2017, type="response")
health_glm_roc2 <- roc.curve(health_pred2[brfss2017$health == "poor"],
health_pred2[brfss2017$health == "good"], curve=TRUE)
# ROC for COPD logistic regression
copd_pred2 <- predict(copd_glm2, data=brfss2017, type="response")
copd_glm_roc2 <- roc.curve(copd_pred2[brfss2017$copd == "no"],
copd_pred2[brfss2017$copd == "yes"], curve=TRUE)
plot(depression_glm_roc2)